A robust and resilience machine learning for forecasting agri-food production

This research proposes a new framework for forecasting agri-food production capacity that considers resiliency and robustness and, for the first time, accounts for disruption and risk. Robust stochastic optimization is applied by adding robustness to the objective function and a resiliency situation to the constraints. The model minimizes the mean absolute deviation and the coefficient of standard deviation of the errors of a linear function for agri-food production capacity. This study suggests that agri-food managers and decision-makers use this mathematical method to forecast and improve production management. The results support better decision-making and are compared with sine-type functions. The main model's Robust and Resiliency Mean Absolute Deviation (RRMAD) value is 1.28% lower than that of the other sine-type functions. The conservativity coefficient, confidence level, weight factor, resiliency coefficient, and probability of the scenario are varied in sensitivity analyses. Increasing the weight factor increases RRMAD and produces a smooth decline in R-squared. Additionally, as the resiliency coefficient rises, the RRMAD function increases while R-squared declines. When the probability of the scenario is altered, the RRMAD function drops and R-squared goes up.

Simple model. This section examines simple models for forecasting using ML. Simple models utilize only one method to predict and show trends. Kantasa-Ard, Nouiri 14 utilized ML for demand forecasting on the physical internet. They applied this model to agricultural products in Thailand. They used a Long Short-Term Memory (LSTM) network with a hybrid Genetic Algorithm (GA) and Scatter Search (SS) to tune the parameters of the LSTM. Pereira and Cerqueira 15 applied ML regression methods to forecast hotel demand for revenue management. They employed 22 methods for short-term demand forecasting with a 14-day lead time. They proposed Arbitrating ML as a meta-learning approach that combines the dynamic ensemble method. They found that using the ML method decreased the mean square error by 54%.
Kohli, Godwin 16 applied linear and KNN regression for forecasting sales. Using this method, they could predict sales and plan for resilience against disruption and fluctuation. Ali et al. 17 measured the environmental and sustainable impact of green supply chain management using manufacturing organization survey data. The authors developed a sustainability framework using machine learning-based CHAID analysis to reduce environmental damage and improve the organization's business performance. They also used the PLS-SEM package with 380 data responses from various manufacturers. Item Response Theory post hoc analysis is used to confirm the scope and effectiveness of the measurement model after additional robustness of the proposed model is validated using various ML (machine learning) techniques 18 . Papacharalampous and Langousis 19 presented Quantile Regression Algorithms (QRA) for water demand forecasting. They used probabilistic thinking to cope with uncertainty and compared the method with quantile regression, linear boosting, generalized random forest, gradient boosting machine, and quantile regression neural network algorithms. In addition, they applied this model to urban water flow.
BV and Dakshayini 20 applied machine learning tools to the project market, used Multiple Linear Regression (MLR) and an ANN model, and attempted to forecast demand in agriculture. They found the proposed model helpful and reliable for planning and producing agri-food. Baryannis, Dani 21 surveyed supply chain risks and presented an ML approach to predict supply chain risks. They sought to define a trade-off between performance and interpretability. They used Data-Driven Artificial Intelligence (AI) techniques to estimate supply chain risks. Lotfi, Kheiri 22 developed a novel approach based on robust regression to predict the number of patients with COVID-19 in Iran. They utilized robust convex optimization and Mean Absolute Deviation (MAD) to forecast the number of COVID-19 patients. They compared the model with a well-known model and showed that the new model performed better than the previous one. For demand forecasting at the retail stage for a few vegetables, Priyadarshi, Panigrahi 26 used a Box-Jenkins-based auto-regressive integrated moving average model along with ML-based algorithms like Long Short-Term Memory (LSTM) networks, SVR, random forest regression, Gradient Boosting Regression (GBR), and extreme GBR (XGBoost/XGBR).
Kilimci, Akyuz 27 proposed a deep learning strategy and decision integration approach for demand forecasting in the supply chain. They proposed an innovative method based on Deep Learning (DL) techniques, the SVR algorithm, and time series and employed this methodology to forecast demand.
Phyo and Jeenanunta 28 demonstrated daily load forecasting using a combination of Classification and Regression Tree (CART) and Deep Belief Network (DBN). They applied this model for load data in the Electricity Generating Authority of Thailand (EGAT). Yucesan, Pekel 29 proposed regression, time series, and ML-based methods for forecasting daily natural gas consumption. They applied the Seasonal Autoregressive Integrated Moving Average with Exogenous Regressors (SARIMAX) and Artificial Neural Networks (ANN). A novel ML approach for demand forecasting and supply chain performance was developed by Feizabadi 30 . In this research, ARIMAX and NN are developed and applied by steel manufacturers.
Research gap. Based on the application of the ML approach, the most pertinent works in the literature are organized and reviewed in Table 1, along with comparisons of the methodology, case study(s) or scenarios, and goal. As is evident, a Robust and Resilience ML (RRML) approach, which is the goal of this study, has not yet been developed.
Given the research gap in Table 1, the main novelty of this study is RRML for predicting future production while considering supply problems. In other words, it is necessary to design a model that predicts production under complicated uncertainty and can be efficiently utilized in future decision-making processes. Figure 2 is a flow chart that describes the research methodology and the steps of the suggested method. The significance of this research can be summed up as a novel approach for production projection, RRML, which considers hard and complex conditions for disruption resilience by using a robust regression approach.
The contributions of this research are as follows:
• A new Robust and Resilience ML (RRML) approach,
• Projecting production by a new robust regression approach,
• Considering hard and complex conditions for resilience against disruption.

Problem description
This study attempts to forecast the quantity of agri-food production as a function of the year. The aim is to plan and forecast agri-food production so that those in charge of decision-making in the agri-food sector can make good decisions and define policy. Despite data uncertainty, the relationship between year ($x_i$) and production ($y_{is}$) is estimated for various scenarios (cf. Fig. 3), which gives the forecasted production ($y'_{is}$) under each scenario. This section introduces the proposed model for long-term forecasting. The years and the corresponding production quantities are used to project production. This research uses robust stochastic optimization to predict the volume of agri-food production. RRML is proposed with the following scope:
• Robust approach: robust stochastic programming for regression,
• Resiliency: a resiliency coefficient that depends on the scenario, as a resiliency approach against disruption.
Mathematical model. The following assumptions are made:
• There is a stochastic nature (uncertainty) in the data.
• There is no dependence between the data.
The objective function (1) minimizes the RRMAD. The RRMAD includes the mean absolute deviation and the coefficient of standard deviation between real and forecasted production over all years. Constraint (2) represents the deviation between real and forecasted production in year i under scenario s. Constraint (3) states the mean forecasted production in year i. Constraint (4) states the standard deviation of forecasted production. Constraint (5) shows the amount of forecasted production in year i under scenario s. Constraint (6) presents the linear regression function that must be fitted. Therefore, polynomial regression is suggested. Constraint (7) requires that the weight factors sum to one. Constraint (8)
Linearization of RRML: subject to
Complexity of the problem. The complexity of the linearized RRML is given by the numbers of binary, positive, and free variables and constraints, as indicated in Eqs. (14) to (17). As can be seen, the scenario set is one of the essential factors for the numbers of constraints and of positive and free variables, which grow linearly with the number of scenarios. This model has no binary variables and is a pure LP. As a result, large-scale instances of this problem can be solved in polynomial time. Consequently, increasing the number of scenarios increases the solution time polynomially.
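As a rough illustration of why a model of this kind stays a linear program, the sketch below fits a polynomial by minimizing a weighted mean absolute deviation over years and scenarios with `scipy.optimize.linprog`. It covers only the absolute-deviation part of the objective; the standard-deviation term, the conservativity coefficient, and the resiliency coefficient are left out because their exact form is not reproduced in this text. The function name `fit_weighted_mad`, the toy data, and the uniform weights and probabilities are assumptions made purely for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def fit_weighted_mad(x, y, p, w, degree=2):
    """Minimise sum_i sum_s w_i * p_s * |y_is - f(x_i)| for a polynomial f.

    x: years, shape (n_i,); y: production, shape (n_i, n_s);
    p: scenario probabilities, shape (n_s,); w: year weights, shape (n_i,).
    Illustrative only: not the full RRMAD objective.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_i, n_s = y.shape
    n_c = degree + 1
    n_e = n_i * n_s

    # Design matrix with columns 1, x, x^2, ...; one row per (i, s) pair.
    X = np.vander(x, n_c, increasing=True)
    X_rep = np.repeat(X, n_s, axis=0)

    # Decision vector: [free coefficients, non-negative deviations e_is].
    cost = np.concatenate([np.zeros(n_c), np.outer(w, p).ravel()])

    # e_is >= f(x_i) - y_is  and  e_is >= y_is - f(x_i), written as A_ub z <= b_ub.
    eye = np.eye(n_e)
    A_ub = np.vstack([np.hstack([X_rep, -eye]), np.hstack([-X_rep, -eye])])
    b_ub = np.concatenate([y.ravel(), -y.ravel()])

    bounds = [(None, None)] * n_c + [(0, None)] * n_e
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n_c], res.fun


# Toy data (assumed): pessimistic, possible, and optimistic scenarios.
x = np.arange(5)
y = np.column_stack([2 + 3 * x - 1, 2 + 3 * x, 2 + 3 * x + 1])
p = np.array([1 / 3, 1 / 3, 1 / 3])
w = np.full(5, 1 / 5)
coef, mad = fit_weighted_mad(x, y, p, w, degree=1)
print(coef, mad)
```

Each additional scenario adds one deviation variable and two inequality rows per year, so in this sketch the problem size grows linearly with the number of scenarios and no binary variables are needed.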
Correlation coefficient of the proposed model. After estimating the parameters of the polynomial regression, the dependency and the quality of the response must be measured. The scenario-based correlation coefficient ($R^2$) is employed to measure the quality of the response (RRMAD). It is calculated according to Eqs. (18), (19):
Comparing with other functions. To compare the proposed model's performance, the function type in constraint (6) is replaced with sine-type functions. With this change, the model remains a Linear Programming (LP) model. These function types are generated to benchmark the performance of the main model: Constraints number = 5|I||S| + 2|I| + 1.
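One plausible way to compute a scenario-based $R^2$ is sketched below. Since Eqs. (18) and (19) are not reproduced in this text, the probability-weighted form used here (residual and total sums of squares per scenario, each weighted by the scenario probability) is an assumption and may differ from the authors' definition. The function name `scenario_r_squared` and the sample numbers are hypothetical.

```python
import numpy as np


def scenario_r_squared(y, y_hat, p):
    """Probability-weighted R^2 over scenarios (assumed form, see lead-in).

    y: observed production, shape (n_i, n_s);
    y_hat: forecasts, shape (n_i,) or (n_i, n_s);
    p: scenario probabilities, shape (n_s,).
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    p = np.asarray(p, dtype=float)
    if y_hat.ndim == 1:
        # One forecast per year, shared by all scenarios.
        y_hat = y_hat[:, None]
    ss_res = ((y - y_hat) ** 2).sum(axis=0)          # per scenario
    ss_tot = ((y - y.mean(axis=0)) ** 2).sum(axis=0)  # per scenario
    return 1.0 - (p @ ss_res) / (p @ ss_tot)


# Hypothetical data: three years, three scenarios.
y_obs = np.array([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]])
probs = np.array([0.2, 0.5, 0.3])
print(scenario_r_squared(y_obs, y_obs[:, 1], probs))
```

A value of 1 corresponds to a perfect fit in every scenario, and a value of 0 to a forecast no better than each scenario's own mean.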

Results and discussion
This research's case study concerns agri-food in Iran. The parameter values were determined through conversations with agricultural managers and are presented in Table 2. The models are solved with GAMS (CPLEX solver) on the following configuration: Intel(R) Core(TM) i5-4210U CPU @ 1.70 GHz, 2.40 GHz, 6.00 GB RAM, and a 64-bit operating system. The sizes of the sets are given in Table 3. The pessimistic, possible, and optimistic scenarios are assigned the same probability. The volume of agri-food production in Iran under uncertainty is shown in Fig. 4. After obtaining the optimal solution of the model, the RRMAD value is 4.698 (Table 3), and the final function coefficients are given in Table 4 and Fig. 5. Finally, the optimal polynomial regression of degree seven is shown in Fig. 6.
Type 2: sine-type function 1, ∀i, s; Type 4: sine-type function 3, ∀i, s.
www.nature.com/scientificreports/
Comparing models. In this section, the main model is compared with the other sine types defined in section "Comparing with other functions". The values of RRMAD and R-squared are given in Table 5 and Fig. 7. As shown, the RRMAD of the main model is 1.28% less than that of the other sine types. This mathematical model also outperforms linear regression and polynomial regression of degree two in RRMAD and R-squared.
Analyzing the conservativity coefficient. The conservativity coefficient ($\beta$) reflects the preference of decision-makers. It is varied in the range of 95%-100%. When this factor increases to 100%, the RRMAD function decreases (Table 6 and Fig. 8). If this factor increases by 5%, the RRMAD function changes by about −1.05%, and R-squared fluctuates (Fig. 9).
Analyzing the confidence level. The confidence level of the decision-maker is denoted by $\alpha$. It is varied between 1 and 5%. If it decreases, the cost function increases (cf. Table 7 and Fig. 10). Reducing it to 1% raises the cost function by 0.70%, and R-squared does not change significantly.

Figure 9. R-squared for conservativity coefficient (x-axis: conservativity coefficient (Beta); y-axis: R-squared).
Analyzing weight factor for old and new data. The weight factor sets the significance of each data point. It is varied between 0 and 100%. When the weight factor is 0%, the significance of each data point is $w_i = \frac{|I|-(i-1)}{|I|(|I|+1)/2}$, which means that old data are more important than new data. When the weight factor is 100%, the significance of each data point is $w_i = \frac{i}{|I|(|I|+1)/2}$, so that new data are more important than old data. If this coefficient grows, the RRMAD increases and R-squared declines smoothly (cf. Figs. 11, 12, and Table 8).
Figure 10. Analyzing the confidence level (x-axis: confidence level (Alpha); y-axis: RRMAD).
Analyzing the resiliency coefficient. The resiliency coefficient ($\rho_s$) is analyzed as a significant factor for the resiliency situation in the proposed model. The RRMAD function increases and R-squared decreases as the resiliency coefficient increases (cf. Table 9). When the resiliency coefficient increases by 5%, the RRMAD function rises by 3.69% (cf. Figs. 13 and 14).
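Returning to the weight factor, the two limiting weighting schemes stated for 0% and 100% can be checked numerically. The sketch below evaluates both formulas with exact fractions and confirms that each set of weights sums to one, as constraint (7) requires. The function names and the choice of four years are illustrative assumptions; how intermediate weight-factor values blend the two schemes is not specified in this text and is not modeled here.

```python
from fractions import Fraction


def old_data_weights(n):
    """w_i = (|I| - (i - 1)) / (|I|(|I| + 1) / 2): earliest year weighs most."""
    total = Fraction(n * (n + 1), 2)
    return [Fraction(n - (i - 1)) / total for i in range(1, n + 1)]


def new_data_weights(n):
    """w_i = i / (|I|(|I| + 1) / 2): latest year weighs most."""
    total = Fraction(n * (n + 1), 2)
    return [Fraction(i) / total for i in range(1, n + 1)]


n_years = 4  # assumed for illustration
w_old = old_data_weights(n_years)
w_new = new_data_weights(n_years)
print([str(w) for w in w_old], sum(w_old))
print([str(w) for w in w_new], sum(w_new))
```

The two schemes are mirror images of each other, so moving the weight factor from 0% to 100% shifts emphasis from the oldest to the newest observations.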
Analyzing the probability of scenario. The probability of scenario ($p_s$), that is, the probability that a scenario occurs, is analyzed in the regression model. The RRMAD function decreases and R-squared increases as the scenario probability changes (cf. Table 10). When the scenario probability increases by 67%, the RRMAD function decreases by 53%, and R-squared increases by 5.6% (cf. Figs. 15 and 16).

Discussion
This study examined an RRML approach for forecasting agri-food production and is the first to combine the concepts of robustness and resiliency for this problem. To deal with uncertainty, this study employed a scenario-based approach. Furthermore, the model treats disruption-based flexibility as resiliency in ML for forecasting and is compared with other functions to demonstrate its performance. Solving the model yields the coefficients of the proposed function in the RRML approach. The proposed model was compared with sine-type functions, and its performance was found to be better, with a lower RRMAD. By embedding robustness and resiliency concepts with a scenario-based approach, this research considers uncertainty that previous research in ML 22 did not address. Resiliency concepts were not considered in previous work, but this research considers this concept to capture disruption uncertainty in the ML model. In addition, the main model is compared with the sine types defined in section "Comparing with other functions". The values of RRMAD and R-squared are given in Table 5 and Fig. 7. As can be seen, the RRMAD of the main model is 1.28% less than that of the other sine types. The conservativity coefficient is varied in the range of 95-100%. When this factor increases to 100%, the RRMAD function decreases (Table 6 and Fig. 8). When this factor increases by 5%, the RRMAD function changes by about −1.05%, and R-squared fluctuates (Fig. 9). The confidence level is varied between 1 and 5%. If it decreases, the cost function increases (cf. Table 7 and Fig. 10). Reducing it to 1% raises the cost function by 0.70%, while R-squared does not change significantly. The weight factor is varied from 0 to 100%. When the weight factor is 0%, old data is more important than new data. When the weight factor is 100%, new data is more important than old data. If this coefficient grows, the RRMAD increases and R-squared declines smoothly (cf. Figs. 11, 12, and Table 8). The resiliency coefficient is analyzed as a significant factor for the resiliency situation in the regression model. The RRMAD function increases and R-squared decreases as the resiliency coefficient increases (cf. Table 9). When the resiliency coefficient increases by 5%, the RRMAD function rises by 3.69% (cf. Figs. 13 and 14). The scenario probability is analyzed as the probability that a scenario occurs in the regression model. The RRMAD function decreases and R-squared increases as the scenario probability changes (cf. Table 10). When the scenario probability increases by 67%, the RRMAD function decreases by 53%, and R-squared rises by 5.6% (cf. Figs. 15 and 16). Sensitivity analyses were thus run for the essential parameters. As a result, it is suitable to embed robustness and resiliency concepts for this problem because these concepts improve the model's performance and make the model robust and resilient against disruption.

Figure 14. R-squared for resiliency coefficient (x-axis: resiliency coefficient; y-axis: R-squared).
Figure 15. RRMAD for the probability of scenario.

Managerial insights and practical implications
This research focuses on predicting agri-food production capacity. A novel ML approach that combines the concepts of robustness and resiliency is utilized for the first time. Robust scenario-based optimization is used to cope with uncertain situations. This method applies flexibility based on disruption as a resiliency strategy. In addition, the proposed model is compared with other models to show its performance, which is suitable for forecasting agri-food production capacity. Managers and decision-makers in agri-food are therefore advised to use this style of mathematical model to predict production volume. This model helps decision-makers make better decisions.

Conclusions and outlook
This research proposes a new framework for forecasting agri-food production capacity that considers resiliency and robustness and, for the first time, accounts for disruption and risk. Robust stochastic optimization is applied by adding robustness to the objective function and a resiliency situation to the constraints. This model minimizes the MAD and the standard deviation coefficient of a predicted linear function for agri-food production. Managers and decision-makers in agri-food are advised to apply this model for forecasting production. This model helps to improve the performance of decision-makers.
Therefore, the results are as follows:
1. The main model is compared with other sine-type functions defined in section "Comparing with other functions". The values of RRMAD and R-squared are given in Table 5 and Fig. 7. As can be seen, the RRMAD is 1.28% less than that of the other sine types.
2. The conservativity coefficient is varied in the range of 95-100%. When this factor increases to 100%, the RRMAD function decreases (Table 6 and Fig. 8). When this factor increases by 5%, the RRMAD function changes by about −1.05%, and R-squared fluctuates (Fig. 9).
3. The confidence level is varied between 1 and 5%. If it decreases, the cost function increases (cf. Table 7 and Fig. 10). Reducing it to 1% raises the cost function by 0.70%, while R-squared does not change significantly.
4. The weight factor is varied from 0 to 100%. When the weight factor is 0%, old data is more important than new data. When the weight factor is 100%, new data is more important than old data. If this coefficient grows, the RRMAD increases and R-squared declines smoothly (cf. Figs. 11, 12, and Table 8).

Figure 16. R-squared for the probability of scenario (x-axis: probability of scenario; y-axis: R-squared).